-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Resolve table names in MV query optimizer for consistent matching #27059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary: MV query optimizer fails to rewrite queries when the specified table name differs between the MV definition and the incoming query (ex: `base_table` vs `schema.base_table`). This fix resolves table references to schema-qualified names, ensuring consistent table matching regardless of how the table was specified. Reviewed By: zation99 Differential Revision: D91699496
Reviewer's GuideEnsures the materialized view (MV) query optimizer matches base tables even when one side uses schema-qualified table names by normalizing table references, and adds tests covering various combinations of qualified and unqualified table names. Sequence diagram for MV base table matching with normalized table namessequenceDiagram
actor OptimizerCaller
participant MaterializedViewQueryOptimizer
participant MaterializedViewInfo
participant MaterializedViewUtils
participant Metadata
OptimizerCaller->>MaterializedViewQueryOptimizer: visitRelation(node, context)
MaterializedViewQueryOptimizer->>MaterializedViewInfo: getBaseTable()
MaterializedViewInfo-->>MaterializedViewQueryOptimizer: Optional baseTable
alt baseTable is present
MaterializedViewQueryOptimizer->>MaterializedViewUtils: resolveTableName(node, session, metadata)
MaterializedViewUtils->>Metadata: createQualifiedObjectName(session, node, tableName)
Metadata-->>MaterializedViewUtils: QualifiedObjectName
MaterializedViewUtils-->>MaterializedViewQueryOptimizer: resolvedRelationFromQuery
MaterializedViewQueryOptimizer->>MaterializedViewUtils: resolveTableName(baseTable, session, metadata)
MaterializedViewUtils->>Metadata: createQualifiedObjectName(session, baseTable, tableName)
Metadata-->>MaterializedViewUtils: QualifiedObjectName
MaterializedViewUtils-->>MaterializedViewQueryOptimizer: resolvedRelationFromMV
alt resolvedRelationFromQuery equals resolvedRelationFromMV
MaterializedViewQueryOptimizer-->>OptimizerCaller: materializedView
else mismatch
MaterializedViewQueryOptimizer-->>OptimizerCaller: IllegalStateException
end
else baseTable not present
MaterializedViewQueryOptimizer-->>OptimizerCaller: IllegalStateException
end
Class diagram for updated MV table name resolutionclassDiagram
class MaterializedViewUtils {
+Expression convertMaterializedDataPredicatesToExpression(Expression expression)
+Expression convertSymbolReferencesToIdentifiers(Expression expression)
+Relation resolveTableName(Relation relation, Session session, Metadata metadata)
}
class MaterializedViewQueryOptimizer {
-Session session
-Metadata metadata
-MaterializedViewInfo materializedViewInfo
-Relation materializedView
+Node visitAliasedRelation(AliasedRelation node, Void context)
+Node visitRelation(Relation node, Void context)
}
class Relation
class Table {
+QualifiedName getName()
}
class QualifiedObjectName {
+String getSchemaName()
+String getObjectName()
}
class QualifiedName {
+static QualifiedName of(String schemaName, String objectName)
}
class Session
class Metadata
class Node
class AliasedRelation
class MaterializedViewInfo {
+Optional getBaseTable()
}
MaterializedViewQueryOptimizer --> MaterializedViewUtils : uses_resolveTableName
MaterializedViewUtils --> Relation
MaterializedViewUtils --> Table
MaterializedViewUtils --> QualifiedObjectName
MaterializedViewUtils --> QualifiedName
MaterializedViewUtils --> Session
MaterializedViewUtils --> Metadata
MaterializedViewQueryOptimizer --> Session
MaterializedViewQueryOptimizer --> Metadata
MaterializedViewQueryOptimizer --> MaterializedViewInfo
MaterializedViewQueryOptimizer --> Node
MaterializedViewQueryOptimizer --> AliasedRelation
Table --|> Relation
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- In
resolveTableName, you strip the catalog and always return aTablewith onlyschema.table; if the original relation was catalog-qualified, this could break matching for multi-catalog setups—consider preserving the catalog component in the returnedQualifiedName. - The
resolveTableNamehelper only handlesTableinstances; sincevisitRelationis called on a genericRelation, it may be safer to either enforce/validate that onlyTablenodes reach this code path or to handle/warn on otherRelationsubtypes explicitly to avoid surprising behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `resolveTableName`, you strip the catalog and always return a `Table` with only `schema.table`; if the original relation was catalog-qualified, this could break matching for multi-catalog setups—consider preserving the catalog component in the returned `QualifiedName`.
- The `resolveTableName` helper only handles `Table` instances; since `visitRelation` is called on a generic `Relation`, it may be safer to either enforce/validate that only `Table` nodes reach this code path or to handle/warn on other `Relation` subtypes explicitly to avoid surprising behavior.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java:453-454` </location>
<code_context>
assertOptimizedQuery(baseQuerySqlWithTablePrefix, expectedRewrittenSql, originalViewSqlWithTablePrefix, BASE_TABLE_1, VIEW_1);
}
+ @Test
+ public void testWithSchemaQualifiedTableName()
+ {
+ String schemaQualifiedTable = SESSION_SCHEMA + "." + BASE_TABLE_1;
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative tests for mismatching schemas/table names to prove we don’t rewrite when we shouldn’t
To make the optimizer behavior more robustly tested, please add one or two negative cases where the schema-qualified name resolves to a *different* table than the MV base table (e.g., `SESSION_SCHEMA.other_table` or `other_schema.base_table`). These should assert that the MV rewrite does not occur (or that the correct exception path is taken), ensuring the new `resolveTableName` logic does not over-match tables.
Suggested implementation:
```java
@Test
public void testWithSchemaQualifiedTableName()
{
String schemaQualifiedTable = SESSION_SCHEMA + "." + BASE_TABLE_1;
String originalViewSql = format("SELECT a, b FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", schemaQualifiedTable);
String expectedRewrittenSql = format("SELECT a, b FROM %s", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
originalViewSql = format("SELECT a, b FROM %s", schemaQualifiedTable);
baseQuerySql = format("SELECT a, b FROM %s", BASE_TABLE_1);
expectedRewrittenSql = format("SELECT a, b FROM %s", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testWithSchemaQualifiedTableNameMismatchingTable()
{
// MV is defined over BASE_TABLE_1; query uses a different table in the same schema
String schemaQualifiedOtherTable = SESSION_SCHEMA + "." + BASE_TABLE_2;
String originalViewSql = format("SELECT a, b FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", schemaQualifiedOtherTable);
// Verify that a different table name in the same schema does NOT trigger a rewrite
assertNotOptimizedQuery(baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testWithSchemaQualifiedTableNameMismatchingSchema()
{
// MV is defined over SESSION_SCHEMA.BASE_TABLE_1; query uses the same table name in a different schema
String otherSchemaQualifiedTable = "other_schema." + BASE_TABLE_1;
String originalViewSql = format("SELECT a, b FROM %s", SESSION_SCHEMA + "." + BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", otherSchemaQualifiedTable);
// Verify that a different schema for the same table name does NOT trigger a rewrite
assertNotOptimizedQuery(baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
```
To fully wire these tests into the existing test harness, you will also need to:
1. Ensure there is a `BASE_TABLE_2` constant defined in this test class (or adjust `testWithSchemaQualifiedTableNameMismatchingTable` to use whatever second base-table constant already exists).
2. Implement or reuse an assertion helper like `assertNotOptimizedQuery(String baseQuerySql, String originalViewSql, String baseTable, String viewName)` that:
- Runs the materialized view optimizer on `baseQuerySql` given the provided MV definition (`originalViewSql`, `baseTable`, `viewName`).
- Asserts that either:
a) The optimized query text is identical to `baseQuerySql` (no rewrite occurred), or
b) The optimizer throws the expected exception for non-rewritable queries (if that is the intended behavior).
3. If such a helper already exists under a different name/signature (e.g., `assertNoRewrite`, `assertNotRewrittenQuery`, etc.), replace `assertNotOptimizedQuery(...)` calls with the appropriate existing helper and adjust the arguments accordingly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| public void testWithSchemaQualifiedTableName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Add negative tests for mismatching schemas/table names to prove we don’t rewrite when we shouldn’t
To make the optimizer behavior more robustly tested, please add one or two negative cases where the schema-qualified name resolves to a different table than the MV base table (e.g., SESSION_SCHEMA.other_table or other_schema.base_table). These should assert that the MV rewrite does not occur (or that the correct exception path is taken), ensuring the new resolveTableName logic does not over-match tables.
Suggested implementation:
@Test
public void testWithSchemaQualifiedTableName()
{
String schemaQualifiedTable = SESSION_SCHEMA + "." + BASE_TABLE_1;
String originalViewSql = format("SELECT a, b FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", schemaQualifiedTable);
String expectedRewrittenSql = format("SELECT a, b FROM %s", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
originalViewSql = format("SELECT a, b FROM %s", schemaQualifiedTable);
baseQuerySql = format("SELECT a, b FROM %s", BASE_TABLE_1);
expectedRewrittenSql = format("SELECT a, b FROM %s", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testWithSchemaQualifiedTableNameMismatchingTable()
{
// MV is defined over BASE_TABLE_1; query uses a different table in the same schema
String schemaQualifiedOtherTable = SESSION_SCHEMA + "." + BASE_TABLE_2;
String originalViewSql = format("SELECT a, b FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", schemaQualifiedOtherTable);
// Verify that a different table name in the same schema does NOT trigger a rewrite
assertNotOptimizedQuery(baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);
}
@Test
public void testWithSchemaQualifiedTableNameMismatchingSchema()
{
// MV is defined over SESSION_SCHEMA.BASE_TABLE_1; query uses the same table name in a different schema
String otherSchemaQualifiedTable = "other_schema." + BASE_TABLE_1;
String originalViewSql = format("SELECT a, b FROM %s", SESSION_SCHEMA + "." + BASE_TABLE_1);
String baseQuerySql = format("SELECT a, b FROM %s", otherSchemaQualifiedTable);
// Verify that a different schema for the same table name does NOT trigger a rewrite
assertNotOptimizedQuery(baseQuerySql, originalViewSql, BASE_TABLE_1, VIEW_1);To fully wire these tests into the existing test harness, you will also need to:
- Ensure there is a
BASE_TABLE_2constant defined in this test class (or adjusttestWithSchemaQualifiedTableNameMismatchingTableto use whatever second base-table constant already exists). - Implement or reuse an assertion helper like
assertNotOptimizedQuery(String baseQuerySql, String originalViewSql, String baseTable, String viewName)that:- Runs the materialized view optimizer on
baseQuerySqlgiven the provided MV definition (originalViewSql,baseTable,viewName). - Asserts that either:
a) The optimized query text is identical tobaseQuerySql(no rewrite occurred), or
b) The optimizer throws the expected exception for non-rewritable queries (if that is the intended behavior).
- Runs the materialized view optimizer on
- If such a helper already exists under a different name/signature (e.g.,
assertNoRewrite,assertNotRewrittenQuery, etc.), replaceassertNotOptimizedQuery(...)calls with the appropriate existing helper and adjust the arguments accordingly.
Summary:
MV query optimizer fails to rewrite queries when the specified table name differs between the MV definition and the incoming query (ex:
base_tablevsschema.base_table).This fix resolves table references to schema-qualified names, ensuring consistent table matching regardless of how the table was specified.
Reviewed By: zation99
Differential Revision: D91699496
Summary by Sourcery
Ensure materialized view query optimization consistently matches base tables regardless of schema qualification in table names.
Bug Fixes:
Tests: